DETAILED ACTION

1. Claims 1-39 have been examined.

Papers Submitted 

2. It is hereby acknowledged that the following papers have been received and placed of 
record in the file: Preliminary Amendment as received on 2/13/02, IDS as received on 7/3/02,
Requests for Corrected Filing Receipt as received on 4/3/02, 5/16/02, and 3/5/04. 

Specification 

3. The title of the invention is not descriptive. A new title is required that is clearly 
indicative of the invention to which the claims are directed. 

4. The lengthy specification has not been checked to the extent necessary to determine the 
presence of all possible minor errors. Applicant's cooperation is requested in correcting any 
errors of which applicant may become aware in the specification. 

Drawings 

5. The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they 
include the following reference character(s) not mentioned in the description: In Fig. 1, reference 
numbers 118 and 126 have not been found in the specification. In Fig. 2, reference numbers 200,
214, and 216 have not been found in the specification. Corrected drawing sheets, or amendment 
to the specification to add the reference character(s) in the description, are required in reply to 
the Office action to avoid abandonment of the application. Any amended replacement drawing 
sheet should include all of the figures appearing on the immediate prior version of the sheet,
even if only one figure is being amended. The replacement sheet(s) should be labeled 
"Replacement Sheet" in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion 
of the drawing figures. If the changes are not accepted by the examiner, the applicant will be 
notified and informed of any required corrective action in the next Office action. The objection 
to the drawings will not be held in abeyance. 

6. The drawings (Fig. 2) are objected to as failing to comply with 37 CFR 1.84(p)(4) because
reference characters "204" and "206" have both been used to designate a component labeled with 
an "S". Even if these are different parts, they should be illustrated differently because they 
appear to be the same part in the figure. Corrected drawing sheets are required in reply to the 
Office action to avoid abandonment of the application. Any amended replacement drawing sheet 
should include all of the figures appearing on the immediate prior version of the sheet, even if 
only one figure is being amended. The replacement sheet(s) should be labeled "Replacement 
Sheet" in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion of the
drawing figures. If the changes are not accepted by the examiner, the applicant will be notified 
and informed of any required corrective action in the next Office action. The objection to the 
drawings will not be held in abeyance. 

7. The drawings (Fig. 4) are objected to as failing to comply with 37 CFR 1.84(p)(5) because
they do not include the following reference character(s) mentioned in the description: Reference
number 400 appears in the specification but not in Fig. 4. Corrected drawing sheets are required
in reply to the Office action to avoid abandonment of the application. Any amended replacement 
drawing sheet should include all of the figures appearing on the immediate prior version of the 
sheet, even if only one figure is being amended. The replacement sheet(s) should be labeled
"Replacement Sheet" in the page header (as per 37 CFR 1.84(c)) so as not to obstruct any portion 
of the drawing figures. If the changes are not accepted by the examiner, the applicant will be 
notified and informed of any required corrective action in the next Office action. The objection
to the drawings will not be held in abeyance. 

Claim Objections 

8. The numbering of claims is not in accordance with 37 CFR 1.126 which requires the
original numbering of the claims to be preserved throughout the prosecution. When claims are 
canceled, the remaining claims must not be renumbered. When new claims are presented, they 
must be numbered consecutively beginning with the number next following the highest 
numbered claims previously presented (whether entered or not). 

Misnumbered claims 35-39 have been renumbered as 36-40, respectively. Note that there 
are two claims numbered as 35, and therefore, the second claim 35 becomes claim 36, whereas 
the claims subsequent to claim 36 increase in number by 1. 

9. Claim 8 is objected to because of the following informalities: Remove "a" before "said". 
Appropriate correction is required. 

10. Claim 11 is objected to because of the following informalities: In line 5, please replace
the phrase "operative cause" with --operative to cause--. Appropriate correction is required.

11. Claim 12 is objected to because of the following informalities: In lines 6-7, please
replace the phrase "operative cause a plurality data" with --operative to cause a plurality of
data--. Appropriate correction is required.
12. Claim 20 is objected to because of the following informalities: In line 3, please reword
the phrase "each said register file". Also, in line 6 of the claim (page 49, line 28), please reword 
the phrase "map onto to the registers". Appropriate correction is required. 

13. Claim 25 is objected to because of the following informalities: Please insert --unit-- after
"data assembly" in line 2. Appropriate correction is required. 

14. Claim 28 is objected to because of the following informalities: Replace "at least one of
said data assembly unit" with --said at least one data assembly unit--. Also, in lines 5-6, replace
"said data assembly unit" with --said at least one data assembly unit--. Appropriate correction is
required. 

15. Claim 29 is objected to because of the following informalities: In line 6, please reword 
the phrase "each said register file". Also, in line 5, the phrase "and is parallel coupled" should 
be reworded as it is not completely clear. Finally, in lines 10-11 of the claim, please reword the
phrase "map onto to the registers". Appropriate correction is required. 

16. Claim 36 is objected to because of the following informalities: On page 53, lines 26 and 
27, remove the redundant "and". Appropriate correction is required. 

17. Claim 40 is objected to because of the following informalities: On page 54, line 21, 
replace "first group instructions" with --first group of instructions--. Also, on page 54, line 27,
replace "load" with --loadable--. Appropriate correction is required.

18. Due to the length of the application, the examiner asks applicant's cooperation in finding 
all minor errors within the claims. The examiner has done his best in finding most of them, but 
applicant should review the claims and make sure no other errors exist. 
Claim Rejections - 35 USC § 112

19. The following is a quotation of the second paragraph of 35 U.S.C. 112: 

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the 
subject matter which the applicant regards as his invention. 

20. Claims 4, 6-7, 12, and 30-31 are rejected under 35 U.S.C. 112, second paragraph, as
being indefinite for failing to particularly point out and distinctly claim the subject matter which 
applicant regards as the invention. 

21. Claim 4 recites the limitations "said first sequence of instructions" and "said second 
sequence of instructions". There is insufficient antecedent basis for these limitations in the 
claim. 

22. Claim 6 recites the limitation "said DRAM array" in line 10. There is insufficient 
antecedent basis for this limitation in the claim. 

23. Claim 7 recites the limitation "said other register set" in lines 5-6. There is insufficient 
antecedent basis for this limitation in the claim. 

24. Claim 12 recites the limitation "said register file" in line 8. There is insufficient 
antecedent basis for this limitation in the claim, as applicant has claimed one or more register
files and so if there are multiple register files, there cannot be "said register file". Claim 12 also 
recites the limitation "said instructions" in line 10. There is insufficient antecedent basis for this 
limitation in the claim, as applicant has claimed a first and a second sequence of instructions. 

25. Claim 30 is rejected under 35 U.S.C. 112, second paragraph, as being indefinite for
failing to particularly point out and distinctly claim the subject matter which applicant regards as 
the invention. More specifically, it is not clear how a data assembly unit can comprise an 
instruction set. A data assembly unit would execute instructions from an instruction set. For 
purposes of this examination, the examiner will interpret the claim as saying the data assembly
unit executes instructions from an instruction set. 

26. Claim 31 is rejected under 35 U.S.C. 112, second paragraph, as being indefinite for
failing to particularly point out and distinctly claim the subject matter which applicant regards as 
the invention. More specifically, it is not clear how a functional unit can comprise an instruction 
set. A functional unit executes instructions from an instruction set. For purposes of this 
examination, the examiner will interpret the claim as saying the functional unit executes 
instructions from an instruction set. 

Claim Rejections - 35 USC §103 

27. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

28. Claims 1, 24, 27, and 35-36 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Parady, U.S. Patent No. 5,933,627 (as disclosed by applicant), in view of Inagami et al., 
U.S. Patent No. 4,881,168 (as disclosed by applicant and herein referred to as Inagami). 

29. Referring to claim 1, Parady has taught a method comprising the steps of: 

a) segmenting the architecture into first and second portions. It should be noted that any group 
of components within Parady may perform a portion. Therefore, first and second portions would 
exist. 
b) executing instructions by said first portion which manipulate only register operands. See
Fig. 1 and note that components 38 and 40, for instance, may be considered part of a first portion. 
These two components would execute floating-point addition, subtraction, and multiplication 
instructions, which manipulate only register operands.

c) Parady has taught executing instructions by said second portion which perform row-oriented 
load/store operations (see Fig. 1, component 32, and note that the load/store unit inherently 
performs load/store instructions from/to rows of memory). Parady has not explicitly taught 
individual register-to-register move operations. However, Official Notice is taken that
register-to-register move operations are well known and expected in the art. These operations at the very
least allow for the copying of registers. Consequently, it would have been obvious to one of 
ordinary skill in the art at the time of the invention to implement register-register move 
operations. 

d) said first portion of said architecture sees a subset of the total available registers as its set of 
architectural registers. See Fig. 3, components 48 and 50, and note that only one thread is in
execution at any given time. Consequently, only that thread's corresponding registers are 
available to the portion. These corresponding registers are merely a subset of all available 
registers. 

e) said first portion of said architecture comprises one or more functional units which execute a 
first program comprising instructions using register operands. Again, see Fig. 1 and note that 
components 38 and 40, for instance, may be considered part of a first portion. 

f) Parady has not taught that said second portion of said architecture executes a second program 
tightly coupled to said first program, said second program comprising parallel row-oriented 
load/store/mask commands. However, Inagami has taught performing such operations. See
Figs. 2, 3A, 3B, 4, and 5. And, as disclosed in column 1, lines 42-48, having such instructions
allows the loading/storing of certain data elements, thereby allowing for more efficient processing
of some programs. Consequently, it would have been obvious to one of ordinary skill in the art
at the time of the invention to modify Parady to include these row-oriented load/store/mask
commands. Furthermore, Parady has not explicitly taught register-to-register move commands, but
as described above, this would have been an obvious modification. Finally, Parady has taught
architectural register set switch commands to insure that data accessed by said first program is 
available when it is needed. See the abstract. Note that when a particular instruction misses a 
cache, for instance, a thread switch will occur, thereby causing the next thread's register file to 
become active (which insures that data accessed by that thread is available when needed). 
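
For illustration only, the thread-switch behavior relied upon in items d) and f) above can be modeled with the following minimal Python sketch. The class and method names (ThreadedRegisterFiles, on_cache_miss) are hypothetical and do not appear in Parady; the sketch assumes one register file per thread, exactly one active register file at a time, and a simple round-robin choice of the next thread.

```python
from typing import List


class ThreadedRegisterFiles:
    """Hypothetical model: one register file per thread; only one is active."""

    def __init__(self, num_threads: int, regs_per_file: int) -> None:
        # Each thread owns a private register file, i.e., a subset of all registers.
        self.files: List[List[int]] = [[0] * regs_per_file for _ in range(num_threads)]
        self.active = 0  # index of the thread whose registers are architectural

    def architectural_registers(self) -> List[int]:
        # The functional units see only the active thread's register file.
        return self.files[self.active]

    def read(self, reg: int) -> int:
        return self.architectural_registers()[reg]

    def write(self, reg: int, value: int) -> None:
        self.architectural_registers()[reg] = value

    def on_cache_miss(self) -> None:
        # Assumed round-robin policy: on a miss, the next thread's register
        # file becomes active and the current one becomes inactive.
        self.active = (self.active + 1) % len(self.files)


rf = ThreadedRegisterFiles(num_threads=2, regs_per_file=4)
rf.write(0, 7)      # thread 0 writes its own r0
rf.on_cache_miss()  # thread 0 misses; thread 1's register file becomes active
print(rf.read(0))   # prints 0: thread 1 sees its own r0, not thread 0's value
```

The model captures only the visibility rule (a subset of all registers is architectural at any time); it does not model Parady's pipeline or cache.
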

Referring to claim 24, Parady has taught a processor as described in claim 21. Parady has not
taught that the data assembly further comprises a bit mask to select one or more data locations
within at least one of said register sets and an instruction set which comprises a command to load
a set of selected elements of a row of said DRAM array into a selected set of data registers, said
selection based on at least one bit in said bit mask. However, Inagami has taught performing
such operations. See Figs. 2, 3A, 3B, 4, and 5. And, as disclosed in column 1, lines 42-48, having
such instructions allows the loading/storing of certain data elements, thereby allowing for more
efficient processing of some programs. Consequently, it would have been obvious to one of
ordinary skill in the art at the time of the invention to modify Parady to include these
row-oriented load/store/mask commands.
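
As an illustration only, a bit-mask-controlled load of selected row elements, of the general kind relied upon from Inagami, might behave as in the Python sketch below. The function name masked_row_load, the use of an integer as the bit mask, and the one-to-one mapping of row element i to data register i are assumptions made for the example and are not taken from Inagami or from applicant's specification.

```python
from typing import List


def masked_row_load(row: List[int], registers: List[int], mask: int) -> List[int]:
    """Hypothetical masked load: row[i] is copied into register i only if bit i of mask is 1."""
    if len(row) != len(registers):
        raise ValueError("row width must match the register-set width")
    result = list(registers)  # unselected registers keep their previous contents
    for i, element in enumerate(row):
        if (mask >> i) & 1:
            result[i] = element
    return result


dram_row = [10, 11, 12, 13]
regs = [0, 0, 0, 0]
print(masked_row_load(dram_row, regs, 0b0101))  # prints [10, 0, 12, 0]
```
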
30. Referring to claim 27, Parady has taught a processor as described in claim 21. Parady has
not taught a mask and switch unit interposed between said DRAM array and at least one of said
functional units. However, Inagami has taught such a unit which performs load/store/mask
operations. See Figs. 2, 3A, 3B, 4, and 5. As disclosed in column 1, lines 42-48, having such a
unit and instructions allows the loading/storing of certain data elements, thereby allowing for 
more efficient processing of some programs. Consequently, it would have been obvious to one 
of ordinary skill in the art at the time of the invention to modify Parady to include a mask/switch 
unit for performing these row-oriented load/store/mask commands. 

31. Referring to claim 35, Parady has taught a method as described in claim 33. Parady has 
not taught that said speculative loading is further controlled by a bit mask that identifies a 
selected subset of said register file to be speculatively loaded. However, Inagami has taught 
loading based on a mask. See Figs. 2, 3A, 3B, 4, and 5. As disclosed in column 1, lines 42-48,
having such a unit and instructions allows the loading/storing of certain data elements, thereby 
allowing for more efficient processing of some programs. Consequently, it would have been 
obvious to one of ordinary skill in the art at the time of the invention to modify Parady to include 
a mask unit for performing these row-oriented load/mask commands. 

32. Referring to claim 36, Parady has taught a method as described in claim 33. Parady has 
not taught that said speculative loading is further processed via a mask and switch unit whereby a 
selected subset of said register file is speculatively loaded with a data element order permutation. 
However, Inagami has taught such a unit which performs load/store/mask operations. See Figs. 2,
3A, 3B, 4, and 5. As disclosed in column 1, lines 42-48, having such a unit and instructions
allows the loading/storing of certain data elements, thereby allowing for more efficient 
processing of some programs. Consequently, it would have been obvious to one of ordinary skill
in the art at the time of the invention to modify Parady to include a mask/switch unit for 
performing these row-oriented load/store/mask commands. 
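
For illustration only, the following Python sketch extends a masked load with the data element order permutation recited in claim 36, in the manner of a mask and switch unit. The function name mask_and_switch_load and the representation of the switch setting as a list perm, where destination register i receives row element perm[i], are hypothetical choices for the example; neither is taken from Inagami or from applicant's disclosure.

```python
from typing import List


def mask_and_switch_load(
    row: List[int], registers: List[int], mask: int, perm: List[int]
) -> List[int]:
    """Hypothetical mask-and-switch load.

    Destination register i receives row[perm[i]] when bit i of mask is 1;
    otherwise register i is left unchanged.
    """
    if not (len(row) == len(registers) == len(perm)):
        raise ValueError("row, register set, and permutation must have equal width")
    if sorted(perm) != list(range(len(row))):
        raise ValueError("perm must be a permutation of the row positions")
    result = list(registers)
    for i, source in enumerate(perm):
        if (mask >> i) & 1:
            result[i] = row[source]
    return result


# Reverse the element order, but leave register 2 untouched (bit 2 of mask is 0).
print(mask_and_switch_load([10, 11, 12, 13], [0, 0, 0, 0], 0b1011, [3, 2, 1, 0]))
# prints [13, 12, 0, 10]
```
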

33. Claim 2 is rejected under 35 U.S.C. 103(a) as being unpatentable over Hayashi et al., 
U.S. Patent No. 5,237,702 (herein referred to as Hayashi), in view of Jager, U.S. Patent No. 
5,423,048. 

34. Referring to claim 2, Hayashi has taught a method for intelligent caching comprising the 
steps of: 

a) splitting an architecture into first and second portions, said first portion comprising a set of 
functional units and a set of architectural registers exercised thereby (see column 1, lines 24-28, 
and note the first portion would be an arithmetical portion which may execute add and multiply 
instructions with register operands), said second portion comprising at least one functional unit 
capable of moving data between a main memory and said set of architectural registers (see 
column 1, lines 24-32, and note the second portion would be a memory portion which allows 
data to be transferred to registers in response to a load instruction, for instance). 

b) splitting a single program into first and second portions, said first portion of said program 
executed on said first portion of the architecture (all arithmetic operations will be performed in 
the first portion), said second portion of said program executed on said second portion of said 
architecture (all memory operations will be performed in the second portion). 

c) whereby said second portion of said architecture is operative to prefetch data into said 
architectural registers prior to being processed by said first portion of said architecture. See the 
title and claim 1 of Hayashi and note that vector data is prefetched and ultimately written to
vector registers. 

d) Although Hayashi has not explicitly taught that said second portion of said architecture is
operative to move results produced by said first portion of said architecture into main memory 
after they are produced by said first portion of said architecture, Official Notice is taken that 
store instructions are well known and expected in the art. They allow data to be moved from 
registers into memory where they may be retrieved at a later time when needed. By doing this, 
the registers may be freed up to perform additional operations until that stored data is required 
again. Consequently, it would have been obvious to one of ordinary skill in the art at the time of 
the invention to modify Hayashi to include store instructions. 

e) Hayashi has not taught that prior to when said first portion of said architecture executes a 
conditional branch instruction, said second portion of said architecture prefetches first and 
second data sets from memory into said architectural registers, said first data set being needed 
when said condition evaluates to true, said second data set being needed when said condition 
evaluates to false. However, Jager has taught that it is beneficial to prefetch along both branch 
paths because the required data will be available regardless of the final direction of the branch. 
This is contrary to prefetching along a single path, where if the path turns out to be the wrong 
path, the prefetching was not helpful and efficiency is reduced. See the abstract and column 1, 
line 50, to column 2, line 27. Consequently, it would have been obvious to one of ordinary skill 
in the art at the time of the invention to modify Hayashi to include prefetching along both paths 
of a branch (branch instructions being well known in the art) so that data will always be available.
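
To illustrate the rationale attributed to Jager, the following minimal Python sketch counts the data misses seen after a conditional branch resolves, comparing prefetching along a single predicted path with prefetching along both paths. The addresses, the function name count_misses, and the assumption that each path's data needs are known before the branch executes are hypothetical simplifications, not details of Jager or Hayashi.

```python
from typing import Iterable, Set

# Hypothetical addresses needed on each side of a conditional branch.
TAKEN_PATH = [0x100, 0x104]
NOT_TAKEN_PATH = [0x200, 0x204]


def count_misses(needed: Iterable[int], prefetched: Set[int]) -> int:
    """Number of needed addresses that were not prefetched before the branch resolved."""
    return sum(1 for addr in needed if addr not in prefetched)


# Single-path prefetching guesses the taken path; dual-path prefetching fetches both.
single_path = set(TAKEN_PATH)
both_paths = set(TAKEN_PATH) | set(NOT_TAKEN_PATH)

for condition in (True, False):
    needed = TAKEN_PATH if condition else NOT_TAKEN_PATH
    print(condition, count_misses(needed, single_path), count_misses(needed, both_paths))
# prints:
# True 0 0
# False 2 0
```

The sketch shows only the trade-off relied upon above; it does not model the additional memory traffic incurred by fetching both paths.
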
35. Claim 3 is rejected under 35 U.S.C. 103(a) as being unpatentable over Inagami, as
applied above, in view of Hayashi, as applied above, and further in view of Jager, as applied 
above. 

36. Referring to claim 3, Inagami has taught in a processor comprising a plurality of memory 
arrays comprising rows and columns of random access memory cells (Fig. 1, component 1), a set 
of functional units which execute a first program (Fig. 1, component 5), and a data assembly unit 
which executes a second program (Fig. 1, components 2-0, 2-1, etc.), said second program being
tightly coupled with said first program, and whereby said data assembly unit is operative to load 
and store a plurality of data elements from a memory row to or from one or more register files 
(Fig. 1, components VR0, VR1, ..., VR7) which each include a parallel access port, a method of
intelligent caching comprising the steps of: 

a) executing a first sequence of instructions on said set of functional units, said functional units 
operative to process data stored in said register files. See column 4, line 63, to column 5, line 1.
Note that arithmetic instructions would be executed here.

b) executing a second sequence of instructions on said data assembly unit, said data assembly 
unit operative to transfer data between said register files and main memory. See Fig. 1 and note 
that data is transferred between registers and memory via the load/store pipes (data assembly 
unit). 

c) Inagami has not taught that said second sequence of instructions instructs said data assembly 
unit to prefetch data into said register files from said DRAM arrays via said parallel access port. 
However, Hayashi has taught such a concept. See the title and claim 1 of Hayashi and note that
vector data is prefetched and ultimately written to vector registers. Clearly, prefetching is 
beneficial because by bringing data into the system from memory before instructions requiring
that data are executed, once the instruction does in fact need to execute, the data is available 
immediately, thereby preventing a time delay. Consequently, it would have been obvious to one 
of ordinary skill in the art at the time of the invention to modify Inagami to include prefetching 
capability. 

d) Inagami in view of Hayashi has not taught that when conditional logic in said first program 
makes it uncertain as to which data will next be needed by said functional units executing said 
first sequence of instructions, said second sequence of instructions instructs said data assembly 
unit to prefetch time-critical data so that irrespective of the conditional outcome in processing 
said first sequence of instructions, the required data will be present in said registers. However, 
Jager has taught that it is beneficial to prefetch along both branch paths because the required data 
will be available regardless of the final direction of the branch. This is contrary to prefetching 
along a single path, where if the path turns out to be the wrong path, the prefetching was not 
helpful and efficiency is reduced. See the abstract and column 1, line 50, to column 2, line 27. 
Consequently, it would have been obvious to one of ordinary skill in the art at the time of the 
invention to modify Inagami in view of Hayashi to include prefetching along both paths of a 
branch (branch instructions being well known in the art) so that data will always be available.

e) Inagami has not explicitly taught that the memory array is a DRAM array. However, Official 
Notice is taken that DRAM and its advantages are well known and expected in the art. More 
specifically, DRAM is a very popular memory technology because of its high density and low 
price (in comparison to other memory such as SRAM). Therefore, it would have been obvious to 
one of ordinary skill in the art at the time of the invention to modify Inagami's main storage to
be a DRAM array. 

37. Claims 4, 12, 20, and 29-32 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Parady, as applied above, in view of Bissett et al., U.S. Patent No. 5,896,523 (herein 
referred to as Bissett). 

38. Referring to claim 4, Parady has taught in a processor comprising a plurality of memory 
arrays (Fig. 2, main memory) which comprise rows and columns of random access memory cells, 
a set of functional units that execute a first program (see Fig. 1, components 38, 40, 42, for 
instance), and a data assembly unit that executes a second program (see Fig. 1, component 32, for 
instance), said second program being tightly coupled with said first program, and whereby said 
data assembly unit is operative to load and store a plurality of data elements from a DRAM row 
to or from one or more register files which each include a parallel access port (see Fig. 1,
components 48 and 40 and note the multiple access ports for parallel access), and a selector
switch operative to include or remove a register file from the architectural register set of said
functional units executing said first sequence of instructions (see the abstract, column 2, lines
35-37, and Fig. 3, component 112, and note that when a thread is switched out, its corresponding
register file becomes inactive, i.e., removed), a method of intelligent caching comprising the steps of:

a) executing said first sequence of instructions on said functional units, whereby said instructions
involve operands, and said operands correspond to architectural registers visible to said 
functional units. Clearly, the floating-point functional units will execute floating-point 
instructions using the floating-point registers as operands. 
b) executing said second sequence of instructions on said data assembly unit. Loads and stores
are executed on the data assembly unit (load/store unit). 

c) Parady has not taught that said execution of said second sequence of instructions is operative 
to prefetch information into one or more register files which are not architectural registers visible 
to said functional units. However, Bissett has taught performing data prefetches in the 
background. And, this is beneficial because background operations do not influence software 
execution, thereby allowing the current program to run normally while also accomplishing a task 
in the background. See column 4, lines 3-8. And, as is known in the art, prefetching is 
beneficial because data is brought in from memory before an instruction needs it. Then, when 
the instruction does actually execute, the data is already fetched, allowing the instruction to 
execute more quickly. Consequently, it would have been obvious to one of ordinary skill in the 
art at the time of the invention to modify Parady in view of Bissett such that Parady performs 
background prefetching to a register file that is not currently in use (one that corresponds to an
inactive thread). 

d) in response to progress made in said first program, said data assembly unit executing one or 
more instructions which transform said one or more register files which received prefetched data 
into architectural register files visible to said functional units and transform current architectural 
register files into non-architectural register files which are inaccessible to said functional units. 
As described above, when an instruction in a first thread misses the cache (see the abstract), a
thread is switched, thereby also causing a new register file to become active. 

e) Parady has not explicitly taught that the memory array is a DRAM array. However, Official 
Notice is taken that DRAM and its advantages are well known and expected in the art. More 
specifically, DRAM is a very popular memory technology because of its high density and low
price (in comparison to other memory such as SRAM). Therefore, it would have been obvious to 
one of ordinary skill in the art at the time of the invention to modify Parady's main storage to be 
a DRAM array. 
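
For illustration only, the combination relied upon for items c) and d) of claim 4 can be sketched in Python as a pair of register files: the data assembly unit prefetches into the file the functional units cannot see, and a swap then makes that file architectural. The class DoubleBufferedRegisters, its methods, and the restriction to exactly two register files are hypothetical simplifications that are not taken from Parady, Bissett, or applicant's specification.

```python
from typing import List


class DoubleBufferedRegisters:
    """Hypothetical model: one architectural and one non-architectural register file."""

    def __init__(self, width: int) -> None:
        self.files: List[List[int]] = [[0] * width, [0] * width]
        self.active = 0  # index of the file visible to the functional units

    @property
    def inactive(self) -> int:
        return 1 - self.active

    def visible(self) -> List[int]:
        # Functional units executing the first program see only the active file.
        return self.files[self.active]

    def background_prefetch(self, memory_row: List[int]) -> None:
        # The data assembly unit fills the non-architectural file, so operands
        # currently in use by the first program are not disturbed.
        if len(memory_row) != len(self.files[self.inactive]):
            raise ValueError("memory row width must match register-file width")
        self.files[self.inactive] = list(memory_row)

    def swap(self) -> None:
        # The prefetched file becomes architectural; the old one becomes non-architectural.
        self.active = self.inactive


regs = DoubleBufferedRegisters(width=4)
regs.background_prefetch([5, 6, 7, 8])
print(regs.visible())  # prints [0, 0, 0, 0]; the prefetched data is not yet visible
regs.swap()            # e.g., in response to progress made in the first program
print(regs.visible())  # prints [5, 6, 7, 8]
```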

39. Referring to claim 12, Parady has taught in a processor comprising a plurality of memory 
arrays (Fig. 2, main memory) which comprise rows and columns of random access memory cells, 
at least one functional unit that executes a first sequence of instructions (see Fig. 1, components 
38, 40, 42, for instance), and a data assembly unit that executes a second sequence of instructions 
(see Fig. 1, component 32, for instance), said second sequence of instructions being tightly 
coupled with said first sequence of instructions, and whereby said data assembly unit is operative 
to cause a plurality of data elements to be transferred in parallel between a memory row and one 
or more register files via a parallel access port in said register file (see Fig. 1, components 48 and 
40 and note the multiple access ports for parallel access), a method of intelligent caching 
comprising the steps of: 

a) executing said first sequence of instructions on said at least one functional unit, whereby said 
instructions involve operands, and said operands correspond to architectural registers visible to 
said at least one functional unit. Clearly, the floating-point functional units will execute floating- 
point instructions using the floating-point registers as operands. 

b) executing said second sequence of instructions on said data assembly unit. Loads and stores 
are executed on the data assembly unit (load/store unit). 

c) Parady has not taught that said execution of said second sequence of instructions is operative 
to prefetch information into one or more register files which are not architectural registers visible 
to said functional units. However, Bissett has taught performing data prefetches in the
background. And, this is beneficial because background operations do not influence software 
execution, thereby allowing the current program to run normally while also accomplishing a task 
in the background. See column 4, lines 3-8. And, as is known in the art, prefetching is 
beneficial because data is brought in from memory before an instruction needs it. Then, when 
the instruction does actually execute, the data is already fetched, allowing the instruction to 
execute more quickly. Consequently, it would have been obvious to one of ordinary skill in the 
art at the time of the invention to modify Parady in view of Bissett such that Parady performs 
background prefetching to a register file that is not currently in use (one that corresponds to an
inactive thread). It can be seen that the execution of the second sequence would cause this 
prefetching because while the second sequence is executing, prefetching should occur for an 
inactive thread. If the second sequence belongs to thread 1, then prefetching for a thread other 
than thread 1 should occur. 

d) executing one or more instructions which transform at least one of said one or more register 
files into an architectural register file visible to said at least one functional unit. As described 
above, when an instruction in a first thread misses the cache (see the abstract), a thread switch
occurs, thereby also causing a new register file to become active.

e) Parady has not explicitly taught that the memory array is a DRAM array. However, Official 
Notice is taken that DRAM and its advantages are well known and expected in the art. More 
specifically, DRAM is a very popular memory technology because of its high density and low 
price (in comparison to other memory such as SRAM). Therefore, it would have been obvious to 



Application/Control Number: 10/074,705 Page 19 

Art Unit: 2183 

one of ordinary skill in the art at the time of the invention to modify Parady's main storage to be
a DRAM array. 

40. Referring to claim 20, Parady has taught a processor as described in claim 19. Parady has 
further taught: 

a) each said register file is capable of being placed into an active state and an inactive state. 
From column 2, lines 25-39 and Fig. 3 of Parady, it should be realized that each thread has its
own register file. Consequently, when a thread is running, its corresponding register file is the 
only active register file; the rest are inactive. 

b) said functional unit is responsive to commands involving architectural register operands that 
map onto to the registers within a register file that is in the active state. Clearly, the system of 
Fig. 1 would operate in such a manner. For instance, component 34 of Fig. 1 could execute a 
multiplication of two operands (retrieved from the register file). And, the current instructions 
will be of a current thread which has a corresponding active register file. 

c) Parady has not taught that said selected one of said register files is in the inactive state (i.e., 
Parady has not taught that prefetching has occurred for an inactive register file). However, 
Bissett has taught performing data prefetches in the background. And, this is beneficial because 
background operations do not influence software execution, thereby allowing the current 
program to run normally while also accomplishing a task in the background. See column 4, lines 
3-8. And, as is known in the art, prefetching is beneficial because data is brought in from 
memory before an instruction needs it. Then, when the instruction does actually execute, the 
data is already fetched, allowing the instruction to execute more quickly. Consequently, it would 
have been obvious to one of ordinary skill in the art at the time of the invention to modify Parady 
in view of Bissett such that Parady performs background prefetching to a register file that is not
currently in use (one that corresponds to an inactive thread). 

41. Referring to claim 29, Parady has taught an intelligent-cache based processor comprising: 

a) a memory array comprising a plurality of random access memory cells. See Fig. 2 and note 
that the system is coupled to main memory which inherently has rows and columns of memory 
cells. Parady has not explicitly taught that the memory array is a DRAM array. However,
Official Notice is taken that DRAM and its advantages are well known and expected in the art. 
More specifically, DRAM is a very popular memory technology because of its high density and 
low price (in comparison to other memory such as SRAM). Therefore, it would have been 
obvious to one of ordinary skill in the art at the time of the invention to modify Parady's main
storage to be a DRAM array. 

b) first and second dual-port registers files, whereby the first port of each of said register files is 
a parallel access port and is parallel coupled to said DRAM array, each said register file capable 
of being placed into an active state and an inactive state. See Fig. 1, and note that each register
file 48 and 50 is a dual-port file (it has a port for reading and a port for writing). The first port would
be the write port, which is coupled to memory for loading data into the file. It would also be a
parallel access port because the write port is in fact 3 write ports, according to Fig. 1, so 3 writes
may occur in parallel. The second port would be the read port so that functional units may read
data and operate on it. Finally, from column 2, lines 25-39 and Fig. 3 of Parady, it should be
realized that each thread has its own register file. Consequently, when a thread is running, its 
corresponding register file is the only active register file; the rest are inactive. 
c) at least one functional unit that executes a first program, said functional unit coupled to said
second port of said register files, said functional unit responsive to commands involving 
architectural register operands that map onto to the registers within a register file that is in the 
active state. Clearly, the system of Fig. 1 would operate in such a manner. For instance, 
component 34 of Fig. 1 could execute a multiplication of two operands (retrieved from the 
register file) and then write the result to the register file via the second port. And, the current 
instructions will be of a current thread which has a corresponding active register file. 

d) Parady has not explicitly taught a data assembly unit that executes an intelligent caching 
program in support of said first program, said data assembly unit responsive to at least one 
command that causes data to be moved between the DRAM array and a register file that is in the 
inactive state. However, Bissett has taught performing data prefetches in the background. And, 
this is beneficial because background operations do not influence software execution, thereby 
allowing the current program to run normally while also accomplishing a task in the background. 
See column 4, lines 3-8. And, as is known in the art, prefetching is beneficial because data is 
brought in from memory before an instruction needs it. Then, when the instruction does actually 
execute, the data is already fetched, allowing the instruction to execute more quickly. 
Consequently, it would have been obvious to one of ordinary skill in the art at the time of the 
invention to modify Parady in view of Bissett such that Parady performs background prefetching 
to a register file that is not currently in use (one that corresponds to an inactive thread).

e) whereby the first and second register files are capable of toggling between said active and 
inactive states, under program control, during program execution. As described in column 2, 
lines 25-39, each thread has its own register file (Fig. 3). And, only one thread is active at a time,
where threads switch due to a long-latency event. See the abstract. Consequently, if thread 1 is
executing, then thread 1's register file will be in the active state while the other threads' register
files will be in the inactive state. However, when thread 1 becomes inactive, so will its register
file, and another thread and its register file will become active.

42. Referring to claim 30, Parady in view of Bissett has taught a processor as described in 
claim 29. Parady has further taught a control coupling between said data assembly unit and said 
register files, whereby said data assembly comprises an instruction set which comprises a 
command to cause at least one of the register files to toggle between said active and said inactive 
state. See the abstract and Fig. 3 of Parady and note that in response to a load instruction missing
the cache, a thread switch will occur. 

43. Referring to claim 31, Parady in view of Bissett has taught a processor as described in 
claim 29. Parady has further taught a control coupling between said functional unit and said 
register files, whereby said functional unit comprises an instruction set which comprises a 
command to cause at least one of the register files to toggle between said active and said inactive 
state. 

44. Referring to claim 32, Parady in view of Bissett has taught a processor as described in 
claim 29. Parady has further taught: 

a) said functional unit is responsive to an instruction set, and instructions within said instruction 
set comprise commands exclusively responsive to register operands, said register operands 
corresponding to a set of architectural registers. See Fig. 1, and note for instance that component 
34 will execute multiply and divide instructions. These instructions typically include only 
register operands and are in the form of MULT R3, R2, R1 (multiply R1 and R2 and store the
result in R3). 

b) said data assembly unit is responsive to an instruction set, and instructions within said 
instruction set comprise: 

(i) Parady has not taught a command to parallel load at least a portion of an inactive 
register set from said DRAM array. However, Bissett has taught performing data 
prefetch commands in the background. And, this is beneficial because background 
operations do not influence software execution, thereby allowing the current program to 
run normally while also accomplishing a task in the background. See column 4, lines 3-8.
And, as is known in the art, prefetching is beneficial because data is brought in from
memory before an instruction needs it. Then, when the instruction does actually execute, 
the data is already fetched, allowing the instruction to execute more quickly. 
Consequently, it would have been obvious to one of ordinary skill in the art at the time of 
the invention to modify Parady in view of Bissett such that Parady performs background 
prefetching to a register file that is not currently in use (one that corresponds to an inactive
thread). 

(ii) a command to toggle said inactive register set into said active state and at the same 
time to toggle an active register set into said inactive state, whereby said architectural 
register set of said functional unit is dependent on the execution of said toggle command. 
From the abstract, it should be noted that a load instruction would perform such a task. 
That is, when a load causes a cache miss, the current active thread and register file 
become inactive while another thread and its register file become active. Once the 
inactive register file becomes active, it becomes the architectural file which is used in
execution until the next switch. 
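For illustration only, the combination relied upon above (a background parallel load into an inactive register set, followed by a toggle that makes that set architectural and deactivates the previously active set) may be pictured with the following Python sketch. The names RegisterBank, parallel_load, toggle, and ROW_WIDTH are hypothetical and are not taken from Parady, Bissett, or the application; the sketch is a simplified model under those assumptions, not a description of either reference's circuitry.

```python
from dataclasses import dataclass, field
from typing import List

ROW_WIDTH = 8  # hypothetical: number of registers filled by one parallel row load


@dataclass
class RegisterBank:
    """Two register sets; exactly one is active (architectural) at a time."""

    sets: List[List[int]] = field(
        default_factory=lambda: [[0] * ROW_WIDTH for _ in range(2)])
    active: int = 0  # index of the set visible to the functional units

    @property
    def inactive(self) -> int:
        return 1 - self.active

    def read(self, reg: int) -> int:
        # Functional units address architectural registers: only the active set.
        return self.sets[self.active][reg]

    def parallel_load(self, row: List[int]) -> None:
        # Data assembly unit: copy a whole memory row into the inactive set
        # in the background, leaving the active set untouched.
        if len(row) != ROW_WIDTH:
            raise ValueError("row width does not match register set size")
        self.sets[self.inactive] = list(row)

    def toggle(self) -> None:
        # Swap roles: the inactive set becomes architectural and vice versa.
        self.active = self.inactive
```

In this model, a program reading register 0 continues to see the old active set after parallel_load, and sees the newly loaded row only after toggle is executed, which corresponds to the dependence of the architectural register set on the toggle command recited in claim 32(ii).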

45. Claims 5, 7-10, and 13 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Parady in view of Bissett, as applied above, and further in view of Jager, as applied above. 

46. Referring to claim 5, Parady in view of Bissett has taught a method as described in claim 
4. Parady in view of Bissett has not taught the step of speculatively prefetching information 
needed by two or more execution paths when a conditional branch in said first instruction 
sequence makes it ambiguous as to which data will next be needed by said functional units. 
However, Jager has taught that it is beneficial to prefetch along both branch paths because the 
required data will be available regardless of the final direction of the branch. This is contrary to 
prefetching along a single path, where if the path turns out to be the wrong path, the prefetching 
was not helpful and efficiency is reduced. See the abstract and column 1, line 50, to column 2, 
line 27. Consequently, it would have been obvious to one of ordinary skill in the art at the time 
of the invention to modify Parady in view of Bissett to include prefetching along both paths of a 
branch (which are well known instructions) so that data will always be available. 

47. Referring to claim 7, Parady has taught a processor that is segmented into first and 
second portions, said first portion comprising a set of functional units and a set of architectural 
registers accessed thereby (see Fig. 1, components 34, 36, and 48, and note that these components 
may be considered a first portion), said second portion comprising at least one other functional 
unit and a first inactive register set (see Fig. 1, component 32, Fig. 3, component 48, and note that
each thread has a corresponding register file and when the thread is inactive, its register file is 
inactive), said other functional unit capable of moving data between a row of an array and said 
other register set (the other functional unit 32 is a load/store unit which loads registers with data 
from a memory array), a method of intelligent caching comprising: 

a) splitting a single program into first and second portions, said first portion of said program 
executed on said first portion of the architecture, said second portion of said program executed 
on said second portion of said architecture. Note that integer ALU instructions would be 
executed by the first portion and load/store instructions would be executed by the second portion. 

b) Parady has not taught that said second portion of said architecture is operative to prefetch data 
into said first inactive register set in anticipation of data requirements by said first portion of
said architecture. However, Bissett has taught performing data prefetches in the background. 
And, this is beneficial because background operations do not influence software execution, 
thereby allowing the current program to run normally while also accomplishing a task in the 
background. See column 4, lines 3-8. And, as is known in the art, prefetching is beneficial 
because data is brought in from memory before an instruction needs it. Then, when the 
instruction does actually execute, the data is already fetched, allowing the instruction to execute 
more quickly. Consequently, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify Parady in view of Bissett such that Parady performs background 
prefetching to a register file that is not currently in use (one that corresponds to an inactive thread).

c) Parady in view of Bissett has not taught that prior to when said first portion of said 
architecture executes a conditional branch instruction, said second portion of said architecture 
prefetches first and second data sets from memory, said first data set being needed when said 
condition evaluates to true, said second data set being needed when said condition evaluates to 
false. However, Jager has taught that it is beneficial to prefetch along both branch paths because
the required data will be available regardless of the final direction of the branch. This is contrary 
to prefetching along a single path, where if the path turns out to be the wrong path, the 
prefetching was not helpful and efficiency is reduced. See the abstract and column 1, line 50, to 
column 2, line 27. Consequently, it would have been obvious to one of ordinary skill in the art at 
the time of the invention to modify Parady in view of Bissett to include prefetching along both 
paths of a branch (which are well known instructions) so that data will always be available.

d) Finally, Parady has not explicitly taught that the memory array is a DRAM array. However,
Official Notice is taken that DRAM and its advantages are well known and expected in the art. 
More specifically, DRAM is a very popular memory technology because of its high density and 
low price (in comparison to other memory such as SRAM). Therefore, it would have been 
obvious to one of ordinary skill in the art at the time of the invention to modify Parady's main
storage to be a DRAM array. 

48. Referring to claim 8, Parady in view of Bissett and further in view of Jager has taught a 
method as described in claim 7. Furthermore: 

a) it is inherent that the prefetching operation comprises executing first an instruction that causes 
a first row of a DRAM array to be loaded into said first inactive register set. Clearly, prefetches 
do not occur by themselves. Instead, the system must be instructed to perform a prefetch. 

b) executing a second instruction that causes a second row of said DRAM array to be loaded into 
a second inactive register set. This is also inherent, as prefetches do not occur by themselves. 
Instead, the system must be instructed to perform a prefetch. 
c) checking a condition and in response to the checking, executing a command that causes a
selected one of said first and second inactive register sets to be activated to become an
architectural register visible to said first portion of said program. See the abstract and Fig. 3,
component 114, of Parady, and note that when a cache miss occurs (condition), a thread switch
occurs, thereby making the current register set inactive and making an inactive register set active 
(along with its thread). 

49. Referring to claim 9, Parady in view of Bissett and further in view of Jager has taught a 
method as described in claim 7. Furthermore: 

a) it is inherent that the prefetching operation comprises executing first an instruction that causes 
a first row of a DRAM array to be loaded into said first inactive register set. Clearly, prefetches 
do not occur by themselves. Instead, the system must be instructed to perform a prefetch. 

b) executing a second instruction that causes a second row of said DRAM array to be loaded into 
a second inactive register set. This is also inherent, as prefetches do not occur by themselves. 
Instead, the system must be instructed to perform a prefetch. 

c) checking a condition and in response to the checking, executing a command that causes said 
architectural register set to assume an inactive state and a selected one of said first and second 
inactive register sets to be activated to become architectural registers visible to said first portion. 
See the abstract and Fig. 3, component 114, of Parady, and note that when a cache miss occurs
(condition), a thread switch occurs, thereby making the current register set inactive and making 
an inactive register set active (along with its thread). 
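The sequence recited in items a) through c) of claims 8 and 9 may be pictured with the following Python sketch. It models the claim language only (two rows prefetched into two inactive register sets, then a condition selecting which set becomes architectural while the current architectural set goes inactive); it does not model Parady's cache-miss thread switch. The function name prefetch_both_paths, the three-set layout, and the row indices are hypothetical.

```python
from typing import List, Tuple

Row = List[int]


def prefetch_both_paths(memory: List[Row], row_if_true: int, row_if_false: int,
                        condition: bool) -> Tuple[int, Row, List[str]]:
    """Hypothetical model of the claimed steps a)-c)."""
    width = len(memory[0])
    register_sets: List[Row] = [[0] * width for _ in range(3)]
    state = ["active", "inactive", "inactive"]  # set 0 starts as architectural

    # a) first instruction: row needed on the 'true' path into inactive set 1
    register_sets[1] = list(memory[row_if_true])
    # b) second instruction: row needed on the 'false' path into inactive set 2
    register_sets[2] = list(memory[row_if_false])

    # c) check the condition, deactivate the current architectural set,
    #    and activate whichever prefetched set matches the outcome
    selected = 1 if condition else 2
    state[0] = "inactive"
    state[selected] = "active"
    return selected, register_sets[selected], state
```

Because both rows are already resident before the condition is checked, the data for either outcome is available as soon as the selected set becomes architectural, which is the benefit attributed to Jager above.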

50. Referring to claim 10, Parady in view of Bissett and further in view of Jager has taught a 
method as described in claim 7. Parady in view of Bissett and further in view of Jager has not 
explicitly taught individual register-to-register move operations. However, Official Notice is
taken that register-to-register move operations are well known and expected in the art. These
operations at the very least allow for the copying of registers. Consequently, it would have been
obvious to one of ordinary skill in the art at the time of the invention to implement
register-to-register move operations. Also, recall that Bissett has taught prefetching in the background.
This is useful so that data will be available immediately to an instruction upon execution. The 
register-move operation is similar to prefetching in that data is moved to a register ahead of time 
so that it will be ready for an instruction. 

51. Referring to claim 13, Parady in view of Bissett has taught a method as described in
claim 12. Furthermore, claim 13 is rejected for the same reasons set forth in the rejection of 
claim 5. 

52. Claims 6, 14-19, 21-23, 25-26, 28, 33-34, and 37-40 are rejected under 35 U.S.C. 103(a) 
as being unpatentable over Parady, as applied above. 

53. Referring to claim 6, Parady has taught in a processor that is segmented into first portion 
comprising at least one functional unit (Fig. 1, components 38, 40, 42, etc.) and second portion
comprising at least one other functional unit (Fig. 1, component 32), a method of intelligent 
caching comprising: 

a) in said first portion, executing a first program comprising instructions that manipulate 
architectural register operands. Note that the floating-point operations are executed by the first
portion, which includes floating-point functional units. Also, note that they access floating-point
registers 50. 
b) in said second portion, executing a second program tightly coupled to said first program, said
second program comprising an architectural register set switch command and at least one parallel 
data transfer command that causes data to be transferred between a parallel-loadable register file 
and a row of said memory. Note from the abstract that load instructions are the switch 
commands. That is, loads are actually "load, if cache-miss, switch threads" instructions. Also, 
load/store unit 32 clearly executes load instructions which load the parallel loadable register file 
(note that files 48 and 50 in Fig. 1 have multiple write ports so that multiple registers may be
loaded). 

c) wherein said second program monitors at least one bit of information generated during 
execution of said first program and executes said architectural register set switch command and 
said parallel data transfer command in support of said first program. Note that the system will 
monitor the L2 cache miss signal 114 (Fig. 3). If there is a miss, then the switch is actually
executed. Also, the parallel data transfer (load) will ultimately be performed. Applicant has not 
specified when the instruction is performed, so clearly, if a load is to be performed, it will be 
performed at some point. 

d) Parady has not explicitly taught that the memory array is a DRAM array. However, Official 
Notice is taken that DRAM and its advantages are well known and expected in the art. More 
specifically, DRAM is a very popular memory technology because of its high density and low 
price (in comparison to other memory such as SRAM). Therefore, it would have been obvious to 
one of ordinary skill in the art at the time of the invention to modify Parady's main storage to be
a DRAM array. 

54. Referring to claim 14, Parady has taught an intelligent cache-based processor comprising: 
a) a memory array comprising a plurality of random access memory cells. See Fig. 2 and note
that the system is coupled to main memory which inherently has rows and columns of memory 
cells. Parady has not explicitly taught that the memory array is a DRAM array. However, 
Official Notice is taken that DRAM and its advantages are well known and expected in the art. 
More specifically, DRAM is a very popular memory technology because of its high density and 
low price (in comparison to other memory such as SRAM). Therefore, it would have been 
obvious to one of ordinary skill in the art at the time of the invention to modify Parady's main
storage to be a DRAM array. 

b) at least one functional unit. See Fig. 1, components 34-46. 

c) at least one data assembly unit. See Fig. 1, component 32. 

d) Parady has taught that said at least one functional unit executes a first program using 
instructions (functional units execute instructions), but has not taught that said data assembly unit 
executes an intelligent caching program that causes at least one row of said DRAM array to be 
speculatively precharged in support of the first program. However, Official Notice is taken that 
prefetching and its advantages are well known and expected in the art. More specifically, 
prefetching allows for the retrieval of data from memory before it is needed by an instruction. 
Consequently, when the instruction finally needs the data, it has already been fetched and is 
therefore ready immediately. This results in faster execution and higher throughput. And, it is 
inherent that a DRAM must be precharged before it is read. Similarly, if a prefetch from DRAM 
is performed, then the DRAM must be speculatively precharged. As a result, it would have been 
obvious to one of ordinary skill in the art at the time of the invention to modify Parady to 
speculatively precharge and then prefetch from DRAM so that performance may be increased. 
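The precharge reasoning above may be illustrated with a simplified open-row model of a single DRAM bank, in which a different row can be activated only after the bank has been precharged. The Python sketch below is a hypothetical, timing-free model: the class DramBank, the function speculative_prefetch, and the command log strings (PRE, ACT, RD, styled after common DRAM command mnemonics) are assumptions for illustration and are not taken from Parady or the application.

```python
from typing import List, Optional


class DramBank:
    """Toy model of one DRAM bank with a single open-row buffer (no timing)."""

    def __init__(self, rows: List[List[int]]) -> None:
        self.rows = rows
        self.open_row: Optional[int] = None  # None: bank is precharged (no row open)
        self.commands: List[str] = []        # log of issued commands

    def precharge(self) -> None:
        # Close the currently open row, returning the bank to the precharged state.
        if self.open_row is not None:
            self.commands.append(f"PRE {self.open_row}")
            self.open_row = None

    def activate(self, row: int) -> None:
        # A row may be opened only when the bank is precharged.
        if self.open_row is not None:
            raise RuntimeError("bank must be precharged before ACT")
        self.commands.append(f"ACT {row}")
        self.open_row = row

    def read_row(self, row: int) -> List[int]:
        # Demand read: on a row miss, precharge and activate before reading.
        if self.open_row != row:
            self.precharge()
            self.activate(row)
        self.commands.append(f"RD {row}")
        return list(self.rows[row])


def speculative_prefetch(bank: DramBank, predicted_row: int) -> List[int]:
    # Precharge and open the predicted row before any instruction demands it,
    # then read it out (e.g. into a register file).
    bank.precharge()
    bank.activate(predicted_row)
    return bank.read_row(predicted_row)
```

In this model, once speculative_prefetch has been issued for a row, a later demand read of that same row issues only a read command, because the precharge and activation were performed ahead of time.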
55. Referring to claim 15, Parady has taught a processor as described in claim 14. Parady has
further taught: 

a) first and second dual-port registers files, whereby the first port of each of said register files is a 
parallel access port and coupled in parallel to said DRAM array, and the second port of each 
respective register file is coupled said functional unit. See Fig. 1, and note that each register file 
48 and 50 is a dual-port file (it has a port for reading and a port for writing). The first port would be
the write port, which is coupled to memory for loading data into the file. It would also be a
parallel access port because the write port is in fact 3 write ports, according to Fig. 1, so 3 writes
may occur in parallel. The second port would be the read port so that functional units may read 
data and operate on it. 

b) said at least one functional unit is switchably coupled to said register files and said at least one 
functional unit executes at least one command to operate on one or more architectural register 
operands that map onto registers within at least one of said register files. See Fig. 3, components
48 and 50, and column 2, lines 25-39: each thread has its own register file. Consequently, when
a first thread is active, its register file is also active, while the others are inactive. However, 
when a second thread becomes active and the first thread becomes inactive due to a switch (see 
the abstract), the second thread's register file will become active while the rest are inactive. 

56. Referring to claim 16, Parady has taught a processor as described in claim 15. Parady has 
further taught that the data assembly unit is responsive to an instruction set that comprises a 
command to transfer data in parallel between a row of said DRAM array and a selected one of 
said register files. The data assembly unit (Fig. 1, component 32) is a load/store unit which is 
responsive to load/store instructions. Load instructions are used to load a register file, and an 
entire row will be loaded in parallel as each row would comprise multiple bits of data and each 
bit is written in parallel. 

57. Referring to claim 17, Parady has taught a processor as described in claim 16. Parady has 
further taught that said command to transfer data in parallel between a row of said DRAM array 
and a selected one of said register files transfers data to or from said speculatively precharged 
DRAM row. Recall that it would have been obvious to prefetch within Parady because it 
increases performance. A prefetch is nothing more than a speculative load (the command). 
Consequently, during a prefetch, data would be transferred from the speculatively precharged
DRAM row (recall it is inherent that a DRAM must be precharged prior to reading). 

58. Referring to claim 18, Parady has taught a processor as described in claim 17. Parady has
further taught that said command to transfer data in parallel between a row of said DRAM array 
and a selected data register file is executed to speculatively prefetch a said DRAM row into a 
selected one of said register files. Clearly, a prefetch must be executed (performed), and the 
prefetch will only occur for a selected one of the register files as only one register file is active at 
any given time. 

59. Referring to claim 19, Parady has taught an intelligent cache-based processor comprising:

a) a memory array comprising a plurality of random access memory cells. See Fig. 2 and note
that the system is coupled to main memory which inherently has rows and columns of memory 
cells. Parady has not explicitly taught that the memory array is a DRAM array. However, 
Official Notice is taken that DRAM and its advantages are well known and expected in the art. 
More specifically, DRAM is a very popular memory technology because of its high density and 
low price (in comparison to other memory such as SRAM). Therefore, it would have been 
obvious to one of ordinary skill in the art at the time of the invention to modify Parady's main
storage to be a DRAM array. 

b) at least one functional unit. See Fig. 1, components 34-46. 

c) at least one data assembly unit. See Fig. 1, component 32. 

d) first and second dual-port registers files, whereby the first port of each of said register files is 
a parallel access port and coupled in parallel to said DRAM array, and the second port of each 
respective register file is coupled said functional unit. See Fig. 1, and note that each register file 
48 and 50 is a dual-port file (it has a port for reading and a port for writing). The first port would be
the write port, which is coupled to memory for loading data into the file. It would also be a
parallel access port because the write port is in fact 3 write ports, according to Fig. 1, so 3 writes
may occur in parallel. The second port would be the read port so that functional units may read 
data and operate on it. 

e) Parady has taught that said at least one functional unit executes a first program using 
instructions (functional units execute instructions), but has not taught that said data assembly unit 
executes an intelligent caching program that causes at least one row of said DRAM array to be 
speculatively prefetched into a selected one of said register files in support of the first program. 
However, Official Notice is taken that prefetching and its advantages are well known and 
expected in the art. More specifically, prefetching allows for the retrieval of data from memory 
before it is needed by an instruction. Consequently, when the instruction finally needs the data, 
it has already been fetched and is therefore ready immediately. This results in faster execution 
and higher throughput. As a result, it would have been obvious to one of ordinary skill in the art 
at the time of the invention to modify Parady to speculatively precharge and then prefetch from
the DRAM so that performance may be increased. 

60. Referring to claim 21, Parady has taught an intelligent cache-based processor comprising: 

a) a memory array comprising a plurality of random access memory cells. See Fig. 2 and note
that the system is coupled to main memory which inherently has rows and columns of memory 
cells. Parady has not explicitly taught that the memory array is a DRAM array. However, 
Official Notice is taken that DRAM and its advantages are well known and expected in the art. 
More specifically, DRAM is a very popular memory technology because of its high density and 
low price (in comparison to other memory such as SRAM). Therefore, it would have been 
obvious to one of ordinary skill in the art at the time of the invention to modify Parady's main
storage to be a DRAM array. 

b) at least one functional unit. See Fig. 1, components 34-46. 

c) at least one data assembly unit. See Fig. 1, component 32. 

d) first and second dual-port registers files, whereby the first port of each of said register files is 
a parallel access port and coupled in parallel to said DRAM array, and the second port of each 
respective register file is coupled said functional unit. See Fig. 1, and note that each register file 
48 and 50 is a dual-port file (it has a port for reading and a port for writing). The first port would be
the write port, which is coupled to memory for loading data into the file. It would also be a
parallel access port because the write port is in fact 3 write ports, according to Fig. 1, so 3 writes
may occur in parallel. The second port would be the read port so that functional units may read 
data and operate on it. 
e) whereby said at least one functional unit is configured to execute a first program using
instructions involving register operands, and said data assembly unit is configured to execute an 
intelligent caching program that causes data to be moved between the register files and the 
DRAM array in support of the first program. Clearly, the system of Fig. 1 would operate in such 
a manner. For instance, component 34 of Fig. 1 could execute a first program containing a 
multiplication operation of two operands (retrieved from the register file). And, the data 
assembly unit (Fig. 1, component 32) will perform loads which move data from memory to the 
register file in support of a program, i.e., the program will operate on data retrieved from 
registers which may have been loaded from the DRAM. 

61. Referring to claim 22, Parady has taught a processor as described in claim 21. Parady has
further taught that the at least one functional unit executes instructions that exclusively include
register operands. See Fig. 1, and note, for instance, that component 34 will execute multiply and
divide instructions. These instructions typically include only register operands and are in the
form of MULT R3, R2, R1 (multiply R1 and R2 and store the result in R3).

62. Referring to claim 23, Parady has taught a processor as described in claim 21. Parady has
further taught that the data assembly unit is responsive to an instruction set that comprises a
command to perform the parallel transfer of data between a row of said DRAM array and a selected
data register file. The data assembly unit (Fig. 1, component 32) is a load/store unit, which is
responsive to load/store instructions. Load instructions are used to load a register file, and an
entire row will be loaded in parallel, as each DRAM row would comprise multiple bits of data
and each bit is written in parallel.
63. Referring to claim 25, Parady has taught a processor as described in claim 21. Parady has
not explicitly taught that the data assembly unit further comprises a row address register and an
instruction set which comprises a command to perform arithmetic on said row address register.
However, Official Notice is taken that row address registers (i.e., registers that hold a
memory address) are well known and expected in the art. More specifically, registers may be used
to hold memory addresses so that the system may achieve different types of addressing modes.
Instead of specifying a long memory address within the instruction, the instruction merely has to
specify a register, and the address is then retrieved from that register. Therefore, it would have been
obvious to one of ordinary skill in the art at the time of the invention to have a row address
register in Parady. Furthermore, registers are routinely operated upon by arithmetic instructions.
Consequently, commands are available to perform arithmetic on the row address register.
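For illustration only, the register-indirect addressing relied upon above can be sketched as a small Python model. This is a hypothetical sketch, not Parady's design: the class and method names (`DataAssemblyUnit`, `set_row`, `add_to_row`, `load_row`) and the list-of-rows memory model are assumptions. It shows a row address register being modified by an arithmetic command and then supplying the row address for a load, so the load instruction itself need not carry the address.

```python
from typing import List


class DataAssemblyUnit:
    """Hypothetical model: a row address register selects the DRAM row to load."""

    def __init__(self, memory: List[List[int]]) -> None:
        self.memory = memory  # assumed model: memory[row] is one DRAM row of words
        self.row_reg = 0      # the row address register

    def set_row(self, row: int) -> None:
        # Place an address in the register; later commands name only the register.
        self.row_reg = row

    def add_to_row(self, delta: int) -> None:
        # Arithmetic command on the row address register (e.g., step ahead by rows).
        self.row_reg += delta

    def load_row(self) -> List[int]:
        # Register-indirect access: the row address comes from row_reg.
        return list(self.memory[self.row_reg])


dram = [[r * 10 + c for c in range(4)] for r in range(8)]
dau = DataAssemblyUnit(dram)
dau.set_row(2)
dau.add_to_row(3)
print(dau.load_row())  # [50, 51, 52, 53]
```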

64. Referring to claim 26, Parady has taught a processor as described in claim 25. Parady has 
further taught that said instruction set further comprises a command to precharge (activate) a row 
pointed to by said row address register. It is inherent that a row of DRAM must be precharged 
before it is read. Consequently, a simple load instruction, which reads a row of memory, will 
precharge the desired row. 

65. Referring to claim 28, Parady has taught a processor as described in claim 21. Parady has
further taught: 

a) at least one coupling between at least one of said functional units and at least one of said data 
assembly units. See Fig. 1, and note that all components are coupled by wires (bus). 

b) whereby information passed across said coupling is used to allow said data assembly unit to 
track the execution of said first program and execute load and store operations in support thereof. 
Clearly, the load/store unit, along with the decoder (which together make up part of the data assembly
unit), tracks execution of the program, and when a load/store instruction is encountered, that
operation will be performed to support the program.

66. Referring to claim 33, Parady has taught in an intelligent cache based processor: 

a) at least one memory array comprising rows and columns of random access memory cells. See 
Fig. 2 and note that the system is coupled to main memory which inherently has rows and 
columns of memory cells. Parady has not explicitly taught that the memory array is a DRAM 
array. However, Official Notice is taken that DRAM and its advantages are well known and 
expected in the art. More specifically, DRAM is a very popular memory technology because of 
its high density and low price (in comparison to other memory such as SRAM). Therefore, it 
would have been obvious to one of ordinary skill in the art at the time of the invention to modify
Parady's main storage to be a DRAM array.

b) at least one functional unit that executes a first program (Fig. 1, components 32-46), and a data 
assembly unit that executes an intelligent caching program in support of said first program 
(Fig. 1, components 32 and 14, at the very least, make up a data assembly unit), a method of
intelligent caching processing comprising: 

c) Parady has not taught, in said data assembly unit, causing a parallel-loadable register file to be
speculatively loaded in parallel from a DRAM row. However, Official Notice is taken that
prefetching and its advantages are well known and expected in the art. More specifically,
prefetching allows for the retrieval of data from memory before it is needed by an instruction.
Consequently, when the instruction finally needs the data, it has already been fetched and is
therefore ready immediately. This results in faster execution and higher throughput. As a result,
it would have been obvious to one of ordinary skill in the art at the time of the invention to
modify Parady to prefetch from DRAM so that performance may be increased.

d) in at least one functional unit, generating a conditional output and based on said condition, 
conditionally mapping said parallel-loadable register file to a set of architectural registers visible 
to said at least one functional unit. Note that in response to a load instruction being executed, a 
cache hit/miss signal will be generated. If a miss occurs then a thread switch will occur (see 
Parady's abstract). If a switch occurs, then a currently inactive register file will become active
and be visible to the functional units. 
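The combination relied upon in items c) and d) can be illustrated with a simplified Python model. This is a hypothetical sketch, not Parady's circuitry: the class `RegisterFileMapper`, its method names, and the assumption that the prefetched file is the one selected on a miss are all illustrative. It shows a whole DRAM row being speculatively loaded into the inactive register file, and a cache hit/miss condition deciding whether that file is mapped onto the architectural registers.

```python
from typing import List


class RegisterFileMapper:
    """Hypothetical model: two parallel-loadable register files, only one of
    which is mapped to the architectural registers at any time."""

    def __init__(self, num_regs: int) -> None:
        self.files = [[0] * num_regs, [0] * num_regs]
        self.active = 0  # index of the file visible to the functional units

    def speculative_prefetch(self, dram_row: List[int]) -> None:
        # Load the inactive file from one DRAM row in a single step, before it
        # is known whether the program will actually use the data.
        target = self.files[1 - self.active]
        if len(dram_row) < len(target):
            raise ValueError("DRAM row is narrower than the register file")
        target[:] = dram_row[: len(target)]

    def on_load_result(self, cache_hit: bool) -> None:
        # Conditional output from a load: a miss triggers a switch, which maps
        # the (prefetched) inactive file onto the architectural registers.
        if not cache_hit:
            self.active = 1 - self.active

    def read(self, reg: int) -> int:
        # Functional units only ever see the currently mapped file.
        return self.files[self.active][reg]


m = RegisterFileMapper(num_regs=4)
m.speculative_prefetch([7, 8, 9, 10, 11])
m.on_load_result(cache_hit=True)
print(m.read(0))  # 0: hit, so no switch; the prefetched file stays hidden
m.on_load_result(cache_hit=False)
print(m.read(0))  # 7: miss, so the prefetched file is now mapped
```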

67. Referring to claim 34, Parady has taught a method as described in claim 33. Parady has 
further taught that said conditionally mapping is initiated under control of said data assembly 
unit. Note that the data assembly unit comprises load/store unit 32 (Fig. 1). The load/store unit 
executes load instructions which ultimately cause the cache miss. Therefore, the mapping is 
initiated by the data assembly unit. 

68. Referring to claim 37, Parady has taught a method as described in claim 33. Parady has 
further taught: 

a) in said data assembly unit, causing a DRAM row to be speculatively precharged in support of 
said first program. Recall that it would have been obvious to prefetch in Parady. In order to 
prefetch, the DRAM row must be precharged or activated. And this is a speculative precharge 
because the data is being fetched ahead of time without knowing whether the program will use it 
or not. 

b) in said at least one functional unit, executing a command in said first program that causes a 
register value to be modified. See Fig. 1, components 32-46 and note that each of these units is 
capable of modifying a register (for instance, component 34 may multiply two registers together
and store the result in a register). 

c) storing at least said modified register value into said speculatively precharged DRAM row. 
From Fig. 1, it can be seen that load/store unit 32 would execute store instructions for storing
register data into the DRAM. The result may be stored in the same row from which data was
prefetched (the speculatively precharged row), since that row is a valid memory row like any
other.

69. Referring to claim 38, Parady has taught a method as described in claim 33. Parady has 
further taught: 

a) in said data assembly unit, causing a DRAM row to be speculatively precharged prior to the 
execution of at least one event that is to be executed in said first program. Recall that it would 
have been obvious to prefetch in Parady. The idea of prefetching is to retrieve data before an 
execution event requires the data. Consequently, the DRAM row must be precharged or 
activated before the execution event. And this is a speculative precharge because the data is 
being fetched ahead of time without knowing whether the program will use it or not. 

b) in said at least one functional unit, executing a command to write a word to at least one 
register in a second register set. See Fig. 1, components 32-46 and note that each of these units is 
capable of modifying a register (for instance, component 34 may multiply two registers together 
and store the result in a register). 

c) in said data assembly unit, causing at least a portion of said second register set to be written 
into said precharged DRAM row. From Fig. 1, it can be seen that load/store unit 32 would
execute store instructions for storing register data into the DRAM. The result may be stored in
the same row from which data was prefetched (the speculatively precharged row), since that row
is a valid memory row like any other.

70. Referring to claim 39, Parady has taught a method as described in claim 33. Parady has 
further taught: 

a) in said at least one functional unit, writing an output to a register in an architectural register 
file. See Fig. 1, components 32-46 and note that each of these units is capable of modifying a 
register (for instance, component 34 may multiply two registers together and store the result in a 
register). 

b) in said data assembly unit, reading said output from said register and using said output as a 
control input in said intelligent caching program. From Fig. 1, it can be seen that load/store unit 
32 would execute store instructions for storing register data into the DRAM. Therefore, the
register data is read out of the register and then stored into the DRAM, so that the register
data controls which data is stored in the DRAM.

71. Referring to claim 40, Parady has taught in an intelligent-cache based processor:
a) the processor comprising: 

(i) at least one memory array comprising rows and columns of random access memory 
cells. See Fig. 2 and note that the system is coupled to main memory which inherently 
has rows and columns of memory cells. Parady has not explicitly taught that the memory 
array is a DRAM array. However, Official Notice is taken that DRAM and its 
advantages are well known and expected in the art. More specifically, DRAM is a very 
popular memory technology because of its high density and low price (in comparison to 
other memory such as SRAM). Therefore, it would have been obvious to one of ordinary 
skill in the art at the time of the invention to modify Parady's main storage to be a
DRAM array.

(ii) at least first and second parallel-loadable register files. See Fig. 1 and Fig. 3,
components 48 and 50. Note that by having multiple write ports, they may be loaded in 
parallel. 

(iii) at least one functional unit that executes a first program and interacts with a set of 
architectural register locations. See Fig. 1, and note that component 34 is a functional unit
which would interact with registers (to perform multiplication and division on them). 

(iv) a data assembly unit that executes an intelligent caching program in support of said 
first program. See Fig. 1, components 32 and 14, and note that these components may be
part of a data assembly unit. Loads and stores are executed by this unit to bring data in 
from cache/main memory and to send data to memory. 

b) a method of intelligent caching processing comprising: 

(i) in said at least one functional unit, executing a first group of instructions, at least some of
which have operands that correspond to said architectural register locations, said
architectural register locations being mapped to said first parallel-loadable register file.
See Fig. 1, and note that component 34 is a functional unit which would interact with
registers (to perform multiplication and division on them). The register file used will be
the register file associated with the active thread. See column 2, lines 25-39, and the
abstract.

(ii) in said data assembly unit, monitoring at least a subset of bits generated by the 
execution of said first group of instructions, and in response thereto, causing said second 
parallel-loadable register file to be parallel loaded from a row of said DRAM array. Note that
when a load instruction is decoded and executed, address and read signals would be
monitored by the DRAM (also part of the data assembly unit) and used to read data from the
DRAM and load it into the appropriate register.

(iii) in said data assembly unit, causing said second parallel-loadable register file to be 
mapped to said set of architectural register locations accessible by said at least one 
functional unit. From the abstract, note that when a load causes a cache miss, a thread 
will be switched, causing a new register file (and thread) to become active. This register 
file will now be visible to the functional units. 

(iv) in said at least one functional unit, executing a second group of instructions at least 
some of which have operands that correspond to said architectural register locations. Again,
the units in Fig. 1 (components 34-46, for instance) may execute a second group of 
instructions (from a second thread) which have operands in the register file. 

(v) whereby said second group of instructions causes data in said second parallel-loadable
register file to be accessed. See Fig. 1, and note that if one instruction in the second group 
is a divide instruction, for instance, then the register file may be accessed to retrieve a 
dividend and a divisor so that the division may be performed. 

72. Claim 11 is rejected under 35 U.S.C. 103(a) as being unpatentable over Inagami, as
applied above, in view of Hayashi, as applied above.

73. Referring to claim 11, Inagami has taught, in a processor comprising at least one memory
array that comprises rows and columns of random access memory cells (Fig. 1, component 1), at
least one functional unit which executes a first program (Fig. 1, component 5), and a data
assembly unit which executes a second program (Fig. 1, components 2-0, 2-1, etc.), said second
program being tightly coupled with said first program, and whereby said data assembly unit is
operative to cause a plurality of data elements to be transferred between a memory row and one or
more register files (Fig. 1, components VR0, VR1, ..., VR7) that each include a parallel access
port, a method of intelligent caching comprising:

a) executing a first sequence of instructions on said at least one functional unit, said at least one 
functional unit operative to process data stored in said at least one register file. See column 4, 
line 63, to column 5, line 1. Note that arithmetic instructions would be executed here.

b) executing a second sequence of instructions on said data assembly unit, said data assembly 
unit operative to cause data to be transferred between said at least one register file and said
array. See Fig. 1 and note that data is transferred between registers and memory via the 
load/store pipes (data assembly unit). 

c) Inagami has not taught that said second sequence of instructions instructs said data assembly 
unit to speculatively prefetch data in parallel from said array in support of said first sequence of 
instructions. However, Hayashi has taught such a concept. See the title and claim 1 of Hayashi 
and note that vector data is prefetched and ultimately written to vector registers. Clearly,
prefetching is beneficial: by bringing data from memory into the system before the
instructions requiring that data are executed, the data is available immediately when those
instructions do execute, thereby avoiding a time delay. Consequently, it would have
been obvious to one of ordinary skill in the art at the time of the invention to modify Inagami to
include prefetching capability.
d) Inagami has not explicitly taught that the memory array is a DRAM array. However, Official
Notice is taken that DRAM and its advantages are well known and expected in the art. More
specifically, DRAM is a very popular memory technology because of its high density and low
price (in comparison to other memory such as SRAM). Therefore, it would have been obvious to
one of ordinary skill in the art at the time of the invention to modify Inagami's main storage to
be a DRAM array.



Conclusion 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to David J. Huisman whose telephone number is (571) 272-4168. 
The examiner can normally be reached on Monday-Friday (8:00-4:30). 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Eddie Chan can be reached on (571) 272-4162. The fax phone number for the 
organization where this application or proceeding is assigned is 703-872-9306. 

Information regarding the status of an application may be obtained from the Patent 
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 